ABSTRACT

A random sample of size $N$ is divided into $k$ clusters that minimize
the within cluster sum of squares locally. The asymptotic properties of
this k-means method (as $k$ and $N$ approach $\infty$), as a procedure for
generating variable cell histograms, are presented. In one dimension, it is
established that the k-means clusters are such that within cluster sums
of squares are asymptotically equal, and that the locally optimal solution
approaches the population globally optimal solution under certain
regularity conditions. A histogram density estimate is proposed, and is
shown to be uniformly consistent in probability.

KEY  WORDS:   K-means  clustering  algorithm;   Asymptotic  properties;   Within 
cluster  sum  of  squares;   Variable  cell  histograms;   Uniform  consistency  in 
probability. 


1.  INTRODUCTION 

Let $X_1, X_2, \ldots, X_N$ be observations from some density $f$ of a
probability distribution $F$. To estimate the univariate density $f$ using
the random sample, the traditional method is the histogram. The asymptotic
properties of the fixed cell histogram are given in the recent text by
Tapia and Thompson (1978). Van Ryzin (1973) first proposed a variable
cell histogram which is adaptive to the underlying density. His procedure
is related to the nearest neighbour density estimates developed by
Loftsgaarden and Quesenberry (1965). In this paper, it is proposed
that the k-means clustering technique can be regarded as a practicable
and convenient way of obtaining variable cell histograms in one or more
dimensions.

Suppose that the observations $X_1, X_2, \ldots, X_N$ are partitioned into
$k$ groups such that no movement of an observation from one group to another
will reduce the within group sum of squares. This technique for division
of a sample into $k$ groups to minimize the within group sum of squares
locally is known in the clustering literature as k-means. In one
dimension, the partition will be specified by $k-1$ cutpoints; the
observations lying between common cutpoints are in the same group. See
Hartigan (1975) for a detailed description of the k-means technique, and see
Hartigan and Wong (1979a) for an efficient computational algorithm. The
asymptotic properties of k-means as a clustering technique (as $N$ approaches
$\infty$ with $k$ fixed) have been studied by MacQueen (1967), Hartigan
(1978), and Pollard (1979). Here, however, the large sample properties of
k-means (as $k$ and $N$ approach $\infty$) as a density estimation technique
are presented.

The asymptotic properties (as $k \to \infty$) of the population k-means
clusters are given in Theorem 1. It is established that the optimal
population partition is such that the within cluster sums of squares are
asymptotically equal, and that the sizes of the cluster intervals are
inversely proportional to the one-third power of the underlying density
at the midpoints of the intervals. Theorem 2 and Theorem 3 give the
asymptotic properties (as $k$ and $N$ approach $\infty$) of the locally
optimal k-means clusters for samples from the uniform $[0,1]$ density. For
samples from a general population $F$, the asymptotic results are given in
Theorem 4 and Theorem 5. It is shown that the locally optimal solution
approaches the population globally optimal solution under certain regularity
conditions. In Theorem 6 and Corollary 7, two proposed estimates are shown to
be uniformly consistent in probability. Two empirical examples are given
in Section 6 to illustrate the performance of one of the estimates.
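
The one-dimensional partition described in this Introduction can be
illustrated with a small sketch. It is not the Hartigan and Wong algorithm
cited above: it uses plain Lloyd-style iterations (an assumption made for
brevity), whose fixed points are partitions in which every observation lies
closest to its own cluster mean. The function names `kmeans_1d` and
`variable_cell_histogram`, the Beta(2, 5) test sample, and the histogram
height rule (cell proportion divided by cell width) are all illustrative
choices, not definitions taken from this paper.

```python
import bisect
import random


def kmeans_1d(sorted_xs, k, max_iter=1000):
    """Lloyd-style iterations on sorted 1-D data; returns the k-1 cutpoints.

    Illustrative stand-in only, not the Hartigan-Wong algorithm.
    """
    n = len(sorted_xs)
    # Prefix sums give each cluster mean in O(1).
    prefix = [0.0]
    for x in sorted_xs:
        prefix.append(prefix[-1] + x)
    # Assumed initialisation: centers at evenly spaced sample quantiles.
    centers = [sorted_xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    cuts = []
    for _ in range(max_iter):
        # In one dimension each cutpoint is the midpoint of adjacent centers.
        new_cuts = [(centers[j] + centers[j + 1]) / 2 for j in range(k - 1)]
        if new_cuts == cuts:
            break  # assignments no longer change
        cuts = new_cuts
        bounds = [0] + [bisect.bisect_right(sorted_xs, c) for c in cuts] + [n]
        for j in range(k):
            lo, hi = bounds[j], bounds[j + 1]
            if hi > lo:  # keep the old center if a cluster becomes empty
                centers[j] = (prefix[hi] - prefix[lo]) / (hi - lo)
    return cuts


def variable_cell_histogram(sorted_xs, cuts, lower, upper):
    """Return (edges, heights) with height = (cell count / n) / cell width."""
    n = len(sorted_xs)
    edges = [lower] + list(cuts) + [upper]
    idx = [0] + [bisect.bisect_right(sorted_xs, c) for c in cuts] + [n]
    heights = [
        (idx[j + 1] - idx[j]) / (n * (edges[j + 1] - edges[j]))
        for j in range(len(edges) - 1)
    ]
    return edges, heights


random.seed(0)
sample = sorted(random.betavariate(2, 5) for _ in range(5000))
cuts = kmeans_1d(sample, k=8)
edges, heights = variable_cell_histogram(sample, cuts, 0.0, 1.0)
for left, right, h in zip(edges, edges[1:], heights):
    print(f"[{left:.3f}, {right:.3f})  height {h:.3f}")
```

Cells come out narrower where the sample is dense and wider in the tail,
which is the adaptive behaviour the variable cell histogram is meant to have.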
2. SOME DEFINITIONS

Let $\{X_N\}$ be a sequence of random variables, and let $\{a_N\}$ be a
sequence of real constants and $\{b_N\}$ be a sequence of positive constants.

1. The notation $a_N = O(b_N)$ means $\limsup_{N} |a_N|/b_N < \infty$.
   (If $a_N = O(b_N)$ and $b_N = O(a_N)$, we say $a_N$ is of order $b_N$.)

2. $X_N = O_p(b_N)$ means that for each $\varepsilon > 0$, there exists a
   real constant $c(\varepsilon)$ and an $N_0(\varepsilon)$ such that
   $\Pr\{|X_N|/b_N > c(\varepsilon)\} < \varepsilon$ for all
   $N > N_0(\varepsilon)$.

3. $a_N = o(b_N)$ means $\lim_{N} a_N/b_N = 0$.

4. $X_N = o_p(b_N)$ means that for each $\varepsilon > 0$,
   $\Pr\{|X_N|/b_N > \varepsilon\} \to 0$ as $N \to \infty$.

5. For a real sequence $\{a_{iN}\}$ and a positive real sequence
   $\{b_{iN}\}$, where $1 \le i \le k_N$, we say $a_{iN} = O(b_{iN})$ if
   $\limsup_{N} \sup_{i} (|a_{iN}|/b_{iN}) < \infty$; if the double sequence
   is considered as the single sequence
   $a_{11}, \ldots, a_{k_1 1}, a_{12}, \ldots, a_{k_2 2}, \ldots$,
   definitions 1 and 5 coincide.

The symbol $f^{(i)}(x)$ will be used to denote the $i$th derivative of $f$
at $x$.




3. ASYMPTOTIC PROPERTIES OF OPTIMAL POPULATION K-MEANS CLUSTERS

Let $f(x)$ be a density function defined on the interval $[a,b]$.

Suppose that $[a,b]$ is to be partitioned into $k$ clusters (or
intervals) so that the within cluster sum of squares of this optimal
$k$-partition is the minimum over all possible $k$-partitions.

Theorem 1: If $f(x) > 0$ for all $x \in [a,b]$, and $f(x)$ is continuous
together with its first four derivatives on $[a,b]$, then we have
uniformly in $1 \le i \le k$,

$$ k\, e_{ik}\, f_{ik}^{1/3} \to \int_a^b [f(x)]^{1/3}\,dx, \tag{3.1} $$

$$ k\, p_{ik}\, f_{ik}^{-2/3} \to \int_a^b [f(x)]^{1/3}\,dx, \tag{3.2} $$

$$ k^3\, \mathrm{WSS}_{ik} \to \Big[\int_a^b [f(x)]^{1/3}\,dx\Big]^3 \Big/ 12 \tag{3.3} $$

as $k \to \infty$,

where $e_{ik}$ = length of the $i$th interval in the optimal $k$-partition,
      $f_{ik}$ = density at the mid-point of the $i$th interval,
      $p_{ik}$ = area under $f$ inside the $i$th interval,
      $\mathrm{WSS}_{ik}$ = within-cluster sum of squares of the $i$th interval.

(The theorem states that, for large $k$, the within cluster sums of
squares are nearly equal; it follows that the length of the interval
containing a point $x$ of density $f(x)$ is proportional to $f(x)^{-1/3}$.)
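
The three limits in Theorem 1 can be checked numerically on a partition that
has the shape the theorem predicts. The sketch below does not compute the
optimal partition itself; it assumes the illustrative density
$f(x) = 0.5 + x$ on $[0,1]$ and places cutpoints so that each cell carries an
equal share of $\int f^{1/3}$, which makes cell lengths proportional to
$f^{-1/3}$. It then compares the normalised cell length, cell probability and
within-cell sum of squares with the limits (3.1) to (3.3). The density, the
value $k = 200$, and the helper `cell_stats` are assumptions of this example.

```python
def cell_stats(u, v):
    """Probability and within-cell sum of squares of f(x) = 0.5 + x on [u, v]."""
    mass = 0.5 * (v - u) + (v**2 - u**2) / 2           # integral of f
    first = 0.5 * (v**2 - u**2) / 2 + (v**3 - u**3) / 3  # integral of x f
    second = 0.5 * (v**3 - u**3) / 3 + (v**4 - u**4) / 4  # integral of x^2 f
    return mass, second - first**2 / mass


k = 200
# Integral of f^{1/3} over [0, 1]; antiderivative is (3/4) (0.5 + x)^{4/3}.
total = 0.75 * (1.5 ** (4 / 3) - 0.5 ** (4 / 3))
# Cutpoints y_i with integral_0^{y_i} f^{1/3} = (i / k) * total.
cuts = [
    (0.5 ** (4 / 3) + (i / k) * total / 0.75) ** 0.75 - 0.5
    for i in range(k + 1)
]

length_ratio, prob_ratio, wss_ratio = [], [], []
for u, v in zip(cuts, cuts[1:]):
    f_mid = 0.5 + (u + v) / 2
    mass, wss = cell_stats(u, v)
    length_ratio.append(k * (v - u) * f_mid ** (1 / 3) / total)   # (3.1)
    prob_ratio.append(k * mass * f_mid ** (-2 / 3) / total)       # (3.2)
    wss_ratio.append(k**3 * wss / (total**3 / 12))                # (3.3)

for name, r in [("length", length_ratio), ("prob", prob_ratio), ("WSS", wss_ratio)]:
    print(name, min(r), max(r))
```

All three ratios stay close to one across the cells, so on this partition the
within-cell sums of squares are nearly equal, as the parenthetical remark
after the theorem describes.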

Proof: The proof is in four parts.

[I]   The $k$-partition of $[a,b]$ consisting of $k$ equal intervals has a
      within cluster sum of squares of order $k^{-2}$; the contribution
      from the $i$th interval to the optimal within cluster sum of squares
      is of order $e_{ik}^3$. Therefore, $e_{ik} = O(k^{-2/3})$. To avoid
      complexity of notation, the $k$'s indexing partitions will be dropped.

[II]  Suppose that $a = y_0 < y_1 < \cdots < y_{k-1} < y_k = b$ are the
      cutpoints of the optimal $k$-partition. Then
      $y_i = a + \sum_{j=1}^{i} e_j$ $(i = 1, \ldots, k)$.

      Denote the center of the $i$th interval by $c_i$ $(i = 1, \ldots, k)$.
      It follows that $c_i = y_{i-1} + \tfrac{1}{2} e_i$.

      Let $m_i$ be the mean of the $i$th interval. That is,

      $$ m_i = \int_{y_{i-1}}^{y_i} x f(x)\,dx \Big/ \int_{y_{i-1}}^{y_i} f(x)\,dx . $$

      Consider any two neighbouring intervals $e_j$ and $e_{j+1}$
      $(1 \le j \le k-1)$. By the optimality of the partition,
      $y_j - m_j = m_{j+1} - y_j$. Thus,

      $$ e_j > y_j - m_j = m_{j+1} - y_j
             = \int_{y_j}^{y_{j+1}} x f(x)\,dx \Big/ \int_{y_j}^{y_{j+1}} f(x)\,dx - y_j $$
      $$     = \int_0^{e_{j+1}} x f(x + y_j)\,dx \Big/ \int_0^{e_{j+1}} f(x + y_j)\,dx
             \ge \frac{1}{2}\,\frac{M_l}{M_u}\, e_{j+1}, $$

      where $M_l = \inf_{a \le x \le b} f(x)$ and
      $M_u = \sup_{a \le x \le b} f(x)$.

      Similarly, $e_{j+1} \ge \frac{1}{2}\,\frac{M_l}{M_u}\, e_j$, so the
      ratio of neighbouring lengths is bounded:
      $\frac{M_l}{2 M_u} \le e_{j+1}/e_j \le \frac{2 M_u}{M_l}$.

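[III] Let us now establish the asymptotic relationship between the lengths
      of neighbouring intervals. Using the Taylor series expansion, we
      have, for any $x$ in the $i$th interval,

      $$ f(x) = f(c_i) + (x - c_i) f^{(1)}(c_i) + \tfrac{1}{2}(x - c_i)^2 f^{(2)}(c_i)
              + \tfrac{1}{6}(x - c_i)^3 f^{(3)}(c_i) + \tfrac{1}{24}(x - c_i)^4 f^{(4)}(\xi_x), $$

      where $\xi_x$ is between $x$ and $c_i$.

      Since the first four derivatives of $f$ are bounded on $[a,b]$, it
      follows from the above series expansion that we have simultaneously
      for all $1 \le i \le k$,

      $$ p_i = \int_{y_{i-1}}^{y_i} f(x)\,dx
             = e_i\big[f(c_i) + \tfrac{1}{24} f^{(2)}(c_i) e_i^2 + O(e_i^4)\big] \tag{3.4} $$

      and

      $$ \int_{y_{i-1}}^{y_i} x f(x)\,dx
             = e_i\big[c_i f(c_i) + \tfrac{1}{12} f^{(1)}(c_i) e_i^2
               + \tfrac{1}{24} c_i f^{(2)}(c_i) e_i^2 + O(e_i^4)\big]. \tag{3.5} $$

      (Note that the universal bound contained in the $O$ term, which is
      independent of $i$ from definition 5, depends on the various bounds of
      the derivatives of $f$.) Therefore,

      $$ m_i = c_i + \frac{f^{(1)}(c_i)}{12 f(c_i)}\, e_i^2 + O(e_i^4). \tag{3.6} $$

      Since the partition is optimal, we have simultaneously for all
      $1 \le i < k$,

      $$ y_i - m_i = m_{i+1} - y_i , $$

      which when combined with (3.6), and with $y_i = c_i + \tfrac{1}{2} e_i
      = c_{i+1} - \tfrac{1}{2} e_{i+1}$, gives

      $$ \tfrac{1}{2} e_i - \frac{f^{(1)}(c_i)}{12 f(c_i)}\, e_i^2 + O(e_i^4)
         = \tfrac{1}{2} e_{i+1} + \frac{f^{(1)}(c_{i+1})}{12 f(c_{i+1})}\, e_{i+1}^2 + O(e_{i+1}^4). $$

      Thus from [II], we have for all $1 \le i < k$,
      $e_{i+1} = e_i[1 + O(e_i)]$.

      It follows that

      $$ e_{i+1} = e_i\Big[1 - \frac{1}{6 f(c_i)}\big(f^{(1)}(c_i) e_i
                   + f^{(1)}(c_{i+1}) e_{i+1}\big) + O(e_i^2)\Big], $$

      since $f(c_i) = f(c_{i+1}) + O(e_i)$,

      $$ = e_i\Big[1 + \frac{1}{2 f(c_i)}\big(f^{(1)}(c_i) e_i
                   + f^{(1)}(c_{i+1}) e_{i+1}\big) + O(e_i^2)\Big]^{-1/3} $$
      $$ = e_i\Big[\frac{f(c_{i+1})}{f(c_i)} + O(e_i^2)\Big]^{-1/3} $$
      $$ = e_i\Big[\frac{f(c_{i+1})}{f(c_i)}\Big]^{-1/3}\,[1 + O(e_i^2)], $$

      since (i) $f(y_i) = f(c_i) + \tfrac{1}{2} f^{(1)}(c_i) e_i + O(e_i^2)$,
      and (ii) $f(y_i) = f(c_{i+1}) - \tfrac{1}{2} f^{(1)}(c_{i+1}) e_{i+1}
      + O(e_i^2)$.

      Equivalently, we have

      $$ e_{i+1}/e_i = [f(c_{i+1})/f(c_i)]^{-1/3}\,[1 + O(e_i^2)]. \tag{3.7} $$

      From (3.4), $p_i = f(c_i) e_i [1 + O(e_i^2)]$, and it can be shown
      from (3.4), (3.5), and (3.6) that

      $$ \mathrm{WSS}_i = \int_{y_{i-1}}^{y_i} (x - m_i)^2 f(x)\,dx
                        = \tfrac{1}{12} f(c_i) e_i^3\,[1 + O(e_i^2)]. $$

      Hence, from (3.7), we obtain

      $$ p_{i+1}/p_i = [f(c_{i+1})/f(c_i)]^{2/3}\,[1 + O(e_i^2)], \tag{3.8} $$

      and

      $$ \mathrm{WSS}_{i+1}/\mathrm{WSS}_i = 1 + O(e_i^2). \tag{3.9} $$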
[IV]  Let us now establish the relationship between $e_i$ and $e_j$ for
      any $1 \le i < j \le k$. It follows from (3.7) that for any pair of
      values of $1 \le i < j \le k$,

      $$ e_i/e_j = [f(c_i)/f(c_j)]^{-1/3}\,\{[1 + O(e_i^2)] \cdots [1 + O(e_{j-1}^2)]\}. $$

      But from [I], $\sup_i e_i = O(k^{-2/3})$.
      Thus

      $$ e_i/e_j = [f(c_i)/f(c_j)]^{-1/3}\,[1 + O(k^{-4/3})]^k
                 = [f(c_i)/f(c_j)]^{-1/3}\,[1 + O(k^{-1/3})] $$

      for all $1 \le i < j \le k$, which implies that we have uniformly in
      $1 \le i < j \le k$,

      $$ (e_i/e_j) \cdot [f(c_i)/f(c_j)]^{1/3} \to 1 \quad \text{as } k \to \infty . $$

      Since $\sum_{i=1}^{k} e_i f(c_i)^{1/3} \to \int_a^b f(x)^{1/3}\,dx$,
      (3.1) follows.

      Similarly, from (3.8) and (3.9), we have uniformly in
      $1 \le i < j \le k$,

      $$ (p_i/p_j) \cdot [f(c_i)/f(c_j)]^{-2/3} \to 1, \quad \text{and} \quad
         \mathrm{WSS}_i/\mathrm{WSS}_j \to 1 $$

      as $k \to \infty$, which in turn give (3.2) and (3.3). And the theorem
      is proved.

Next, we will examine the asymptotic properties (as $k$ and $N$
approach $\infty$) of the locally optimal sample k-means clusters.
4. ASYMPTOTIC PROPERTIES OF LOCALLY OPTIMAL K-MEANS CLUSTERS

4.1. The Uniform Case

Let $x_1, x_2, \ldots, x_N$ be a random sample from the uniform
distribution on $[0,1]$. Suppose that the $N$ observations are grouped
into $k_N$ clusters so that the within cluster sum of squares of this
locally optimal $k_N$-partition cannot be decreased by moving any
single point from its present cluster to any other cluster. There may
be many local optima; Theorem 2 shows that they all converge to the
globally optimal partition of the population. (Denote the $j$th order
statistic by $x_{(j)}$. If $x_{(i_j)}, x_{(i_j+1)}, \ldots, x_{(l_j-1)},
x_{(l_j)}$ are the observations in the $j$th cluster, then the length and
the midpoint of the $j$th cluster are defined to be
$[x_{(l_j+1)} - x_{(i_j-1)}]$ and $\tfrac{1}{2}[x_{(l_j+1)} + x_{(i_j-1)}]$
respectively, where $x_{(0)} = 0$ and $x_{(N+1)} = 1$.)

To determine the ratio of the lengths of two adjacent clusters,
we need to use the means of the observations in the clusters to locate
accurately the midpoints of the clusters. A theorem of large deviations
due to Feller can be used to prove that the cluster means are suitably
close to the midpoints.

Lemma 1 (Feller's theorem of large deviations; see Feller (1971,
pp. 549-553) for proof.):

Let $x_1, x_2, \ldots, x_N$ be a random sample from a common distribution
$F$ such that $E(x_1) = 0$, $E(x_1^2) = \sigma^2$.

Let $G_N(\cdot)$ stand for the distribution of the normalized sum
$\sum_{i=1}^{N} x_i / (\sigma N^{1/2})$
and $1 - S(y_N) = (2\pi)^{-1/2}\, y_N^{-1} \exp(-\tfrac{1}{2} y_N^2)$.

Then provided that

(i)   the characteristic function of $F$ is analytic in a neighbourhood
      of the origin,
(ii)  $y_N$ varies with $N$ in such a way that $y_N \to \infty$ and
      $y_N N^{-1/6} \to 0$ as $N \to \infty$,

we have $[1 - G_N(y_N)]/[1 - S(y_N)] \to 1$ as $N \to \infty$.

Lemma 2:

Let $z_1, z_2, \ldots$ be i.i.d. random variables uniformly distributed on
$[a,b]$.

Let $\mu = (a + b)/2$ and $h = (b - a)$. Put $\alpha_N = \log N / N$.

Then there exist constants $C$, $D$, and $N_0$ such that if $N \ge N_0$ and
$n > N \alpha_N^{1/3}/16$,

$$ \Pr\Big\{ h^{-1} n^{1/2} \Big| \frac{z_1 + z_2 + \cdots + z_n}{n} - \mu \Big|
   > C (\log N)^{1/2} \Big\} < D N^{-2} (\log N)^{-1/2} . $$

Proof:

Now $E[z_i - \mu] = 0$ and $E[(z_i - \mu)^2] = h^2/12$.
Therefore by Lemma 1, since $(4 \log N)^{1/2}\, n^{-1/6} \to 0$ as
$n \to \infty$, we have

$$ \Pr\Big\{ \sqrt{12}\, h^{-1} n^{-1/2} \sum_{i=1}^{n} (z_i - \mu) > (4 \log N)^{1/2} \Big\}
   \Big/ \big[(2\pi)^{-1/2} (4 \log N)^{-1/2} N^{-2}\big] \to 1
   \quad \text{as } n \to \infty . $$

And the lemma follows.

(It will be shown later that, when $N$ is large enough, all of the $k_N$
clusters contain at least $N \alpha_N^{1/3}/16$ observations with probability
tending to one.)


In the application of Lemma 2, $n$ will be the number of observations
in an interval of length $h$; so $n$ is approximately $Nh$. This
is made more precise in Lemma 3, which is a direct consequence of
Donsker's theorem for empirical processes (see Billingsley (1968), p.
141). Together with Lemma 2, Lemma 3 gives a uniform estimate of
the deviations between cluster means and midpoints for those clusters that
are long enough. The main difficulty to be overcome in the proof of
Theorem 2 is showing that all the cluster intervals are long enough.

Lemma 3:

Let $x_1, x_2, \ldots, x_N$ be a random sample from the uniform density on
$[0,1]$. Denote the length of an interval $I$ by $s_I$.

Then $\sup_I |n_I - N s_I| = O_p(N^{1/2})$, where $n_I$ is the number of
observations in $I$ and the sup is taken over all open subintervals $I$ of
$[0,1]$.
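
The order of magnitude in Lemma 3 can be observed in a small simulation. For
a uniform sample, the supremum of $|n_I - N s_I|$ over intervals equals $N$
times the sum of the largest upward and largest downward deviations of the
empirical distribution function from the identity, so it can be computed
exactly from the order statistics. The function name `sup_count_deviation`,
the seed, and the sample sizes below are assumptions of this sketch; the
ratio to $N^{1/2}$ should stay of order one as $N$ grows.

```python
import math
import random


def sup_count_deviation(sample):
    """sup over intervals I of |n_I - N * s_I| for a sample from Uniform[0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # Largest excess of the empirical CDF over the uniform CDF, and vice versa.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return n * (d_plus + d_minus)


random.seed(1)
ratios = {}
for size in (1_000, 10_000, 100_000):
    sample = [random.random() for _ in range(size)]
    ratios[size] = sup_count_deviation(sample) / math.sqrt(size)
    print(size, round(ratios[size], 3))
```

The printed ratios fluctuate from sample to sample but do not grow with $N$,
which is what the $O_p(N^{1/2})$ statement asserts.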

Lemma 4:

Let $x_1, x_2, \ldots, x_N$ be a random sample from the uniform distribution
on $[0,1]$.

Then there exists a constant $C_0$ such that

$$ \Pr\Big\{ \sup_I s_I^{-1/2}\, |\bar{x}_I - \mu_I| < C_0 \alpha_N^{1/2} \Big\} = 1 - o(1), $$

where $\bar{x}_I$ = mean of observations in $I$, $\mu_I$ = midpoint of $I$,
$s_I$ = length of $I$, and the sup is taken over all open intervals $I$
(whose boundary points are order statistics) containing at least
$N \alpha_N^{1/3}/16$ observations.

Proof:

For any $N > N_0$, consider an interval $I$ of the form
$(x_{(m)}, x_{(m+n_I+1)})$, where $n_I > N \alpha_N^{1/3}/16$. Using Lemma 2
(first conditioning on the two order statistics and then integrating out),
we obtain

$$ \Pr\{ n_I^{1/2} s_I^{-1}\, |\bar{x}_I - \mu_I| > C (\log N)^{1/2} \} < D N^{-2} (\log N)^{-1/2} . $$

Now, from Lemma 3, $n_I \ge N s_I/2$ with probability tending to one.
Therefore,

$$ \Pr\{ s_I^{-1/2}\, |\bar{x}_I - \mu_I| > C_0 \alpha_N^{1/2} \} < D N^{-2} (\log N)^{-1/2} . $$

Since the number of possible intervals $I$ is bounded by $N^2$, we have

$$ \Pr\Big\{ \sup_I s_I^{-1/2}\, |\bar{x}_I - \mu_I| < C_0 \alpha_N^{1/2} \Big\} > 1 - D (\log N)^{-1/2}, $$

where $C_0 = \sqrt{2}\, C$ and the sup is taken over all intervals of the
form $(x_{(m)}, x_{(m+n_I+1)})$ with $n_I > N \alpha_N^{1/3}/16$. The lemma
follows.

Theorem 2: Let $x_1, x_2, \ldots, x_N$ be a random sample from the uniform
distribution on $[0,1]$. Let $e_j(N)$ $(j = 1, 2, \ldots, k_N)$ be the
length of the $j$th cluster of a locally optimal $k_N$-partition. Let
$\alpha_N = \log N / N$. Then, provided that $k_N$ increases with $N$ in such
a way that $k_N \alpha_N^{1/3} \to 0$ as $N \to \infty$, we have

$$ \max_{1 \le j \le k_N} |k_N e_j(N) - 1| = o_p(1) . $$
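
A small simulation can illustrate the statement of Theorem 2. The sketch
below is hedged in two ways. First, it reaches a stable partition with
Lloyd-style iterations (every observation nearest to its own cluster mean),
which is used here as a convenient stand-in for the single-point-transfer
local optimum of the theorem. Second, the sample size, the number of
clusters, the seed, and the name `lloyd_boundaries` are assumptions of the
example. Cluster lengths follow the definition of Section 4.1, with
$x_{(0)} = 0$ and $x_{(N+1)} = 1$.

```python
import bisect
import math
import random


def lloyd_boundaries(xs, k, max_iter=1000):
    """Index boundaries 0 = b_0 <= ... <= b_k = n of a Lloyd fixed point.

    Cluster j holds xs[b_j : b_{j+1}]; xs must be sorted.
    """
    n = len(xs)
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    # Assumed initialisation: centers at evenly spaced sample quantiles.
    centers = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    bounds = None
    for _ in range(max_iter):
        cuts = [(centers[j] + centers[j + 1]) / 2 for j in range(k - 1)]
        new_bounds = [0] + [bisect.bisect_right(xs, c) for c in cuts] + [n]
        if new_bounds == bounds:
            break  # the partition is stable
        bounds = new_bounds
        for j in range(k):
            lo, hi = bounds[j], bounds[j + 1]
            if hi > lo:
                centers[j] = (prefix[hi] - prefix[lo]) / (hi - lo)
    return bounds


random.seed(3)
N, k = 50_000, 10
xs = sorted(random.random() for _ in range(N))
bounds = lloyd_boundaries(xs, k)

# padded[m] is the order statistic x_(m), with x_(0) = 0 and x_(N+1) = 1.
padded = [0.0] + xs + [1.0]
lengths = [padded[bounds[j + 1] + 1] - padded[bounds[j]] for j in range(k)]
max_dev = max(abs(k * e - 1) for e in lengths)

alpha = math.log(N) / N
print("k * alpha^(1/3) =", round(k * alpha ** (1 / 3), 3))
print("max |k e_j - 1| =", round(max_dev, 4))
```

With these settings $k_N \alpha_N^{1/3}$ is below one but not very small, so
only rough agreement with the limit should be expected; increasing $N$ for
fixed $k$ tightens the cluster lengths around $1/k$.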

Proof:

Consider a locally optimal $k_N$-partition with $k_N = o(\alpha_N^{-1/3})$.
The proof is in three parts. In part I, it is shown that if a cluster is of
length $\ge 1/(2k_N)$, then both it and its neighbouring clusters contain at
least $N \alpha_N^{1/3}/16$ observations. In part II, using the result of
Lemma 4, the relationship between the lengths of neighbouring clusters is
established; a bound of their ratio is given by $1 + k_N^{-1} o_p(1)$. Since
the length of the largest cluster is $\ge 1/k_N$, applying parts I and II
repeatedly gives the result of this theorem.

To avoid wordiness, statements are to be read as if they included the
qualification: "with probability tending to one as $N$ approaches
infinity".

[I]   Suppose that the $j$th cluster is of length $\ge 1/(2k_N)$.

      By Lemma 3, it contains at least $N/(2k_N) - O_p(N^{1/2})$
      observations. Thus, the number of observations in the $j$th cluster
      exceeds $N/(4k_N)$.

      But $k_N = o(\alpha_N^{-1/3})$, therefore this number
      $> N \alpha_N^{1/3}/4$.

      Using Lemma 4, we have

      $$ |\bar{x}_j - \mu_j| < C_0\, \alpha_N^{1/2}\, e_j^{1/2}, \tag{4.1} $$

      where $\bar{x}_j$ = mean of observations in the $j$th cluster,
      and $\mu_j$ = midpoint of the $j$th cluster.

      Consider the $(j-1)$st cluster, a cluster adjacent to the $j$th
      cluster. Let $y$ be the largest observation in the $(j-1)$st cluster
      and $z$ be the smallest observation in the $j$th cluster. Then by
      local optimality, the midpoint $q$ between $\bar{x}_{j-1}$ and
      $\bar{x}_j$ must lie between $y$ and $z$. And

      $$ e_{j-1} \ge q - \bar{x}_{j-1} = \bar{x}_j - q = \bar{x}_j - y - O_p(\alpha_N)
                 = (\bar{x}_j - \mu_j) + \tfrac{1}{2} e_j - O_p(\alpha_N), $$

      since the largest gap between successive order statistics is
      $O_p(\alpha_N)$.

      From (4.1), we obtain

      $$ e_{j-1} \ge \tfrac{1}{2} e_j - |\bar{x}_j - \mu_j| - O_p(\alpha_N) $$
      $$         > \tfrac{1}{2} e_j - C_0\, \alpha_N^{1/2} e_j^{1/2} - O_p(\alpha_N) $$
      $$         > 1/(8k_N), \quad \text{since } e_j \ge 1/(2k_N) . $$

      Therefore, by Lemma 3, the $(j-1)$st interval contains at least
      $N/(8k_N) - O_p(N^{1/2}) > N \alpha_N^{1/3}/16$ observations
      eventually.

[II]  Now, applying Lemma 4 to the $(j-1)$st cluster, we have

      $$ |\bar{x}_{j-1} - \mu_{j-1}| < C_0\, \alpha_N^{1/2}\, e_{j-1}^{1/2} . \tag{4.2} $$

      Since $q - \bar{x}_{j-1} = \bar{x}_j - q$, we have

      $$ \mu_{j-1} + \tfrac{1}{2} e_{j-1} - O_p(\alpha_N) - \bar{x}_{j-1}
         = \bar{x}_j - \big(\mu_j - \tfrac{1}{2} e_j + O_p(\alpha_N)\big). \tag{4.3} $$

      Combining (4.1), (4.2) and (4.3), we obtain

      $$ |e_{j-1} - e_j| \le 2 C_0\, \alpha_N^{1/2} e_j^{1/2}
           + 2 C_0\, \alpha_N^{1/2} e_{j-1}^{1/2} + O_p(\alpha_N) \tag{4.4} $$
      $$ \le O_p(\alpha_N^{1/2}) $$
      $$ \le e_j/2 , \quad \text{since } e_j \ge 1/(2k_N) > \alpha_N^{1/3}/2 . $$

      Hence $\tfrac{1}{2} e_j \le e_{j-1} \le \tfrac{3}{2} e_j$.

      Therefore (4.4) can be written as

      $$ |e_{j-1}/e_j - 1| \le 2 C_0\, \alpha_N^{1/2} e_j^{-1/2}
           + 2 \sqrt{3/2}\; C_0\, \alpha_N^{1/2} e_j^{-1/2} + 2 e_j^{-1} O_p(\alpha_N)
           = k_N^{-1}\, o_p(1), \tag{4.5} $$

      since $k_N \alpha_N^{1/3} = o(1)$; and this bound does not depend on
      the intervals involved.
[III] Let $e_l$ and $e_s$ be the length of the largest and the smallest
      cluster respectively. Then $e_l \ge 1/k_N > 1/(2k_N)$. Thus, from
      (4.5), by carrying out at most $k_N$ comparisons of adjacent clusters,
      we obtain

      $$ e_l/e_s = [1 + k_N^{-1} o_p(1)]^{k_N} = 1 + o_p(1) . \tag{4.6} $$

      But for each $j = 1, \ldots, k_N$, $e_l \ge e_j \ge e_s$.

      Therefore from (4.6), we have for all $1 \le j \le k_N$,

      $$ e_s (1 + o_p(1)) \ge e_j \ge e_s . $$

      Summing over $j$, we obtain

      $$ k_N e_s (1 + o_p(1)) \ge \sum_{j} e_j \ge k_N e_s . \tag{4.7} $$

      Now, since the $e_j$'s cover the interval $[0,1]$ with at most $k_N$
      overlaps of length $O_p(\alpha_N)$,

      $$ \sum_{j} e_j = 1 + k_N O_p(\alpha_N) = 1 + o_p(1) . $$

      Substituting in (4.7), we have $e_s = k_N^{-1} (1 + o_p(1))$.
      Similarly, $e_l = k_N^{-1} (1 + o_p(1))$ and the theorem is proved.

Next, we will show that the within cluster sums of squares of the $k_N$
clusters are asymptotically equal. First, Feller's theorem on large
deviations is used to obtain a uniform estimate of the within cluster sum
of squares, which is a function of the length of the cluster interval.
Then using Theorem 2, the result (Theorem 3) follows.

(Let $x_1, x_2, \ldots, x_n$ be a set of observations. The within cluster
sum of squares of this set of observations is defined to be
$\sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x}$ is the mean of the
observations.)

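Lemma 5:

Let $z_1, z_2, \ldots$ be i.i.d. random variables uniformly distributed on
$[a,b]$. Let $\mu = (a + b)/2$ and $h = (b - a)$. Put
$\alpha_N = \log N / N$. Then there exist constants $C'$, $D'$ and $N_0'$
such that if $N \ge N_0'$ and $n > N \alpha_N^{1/3}/16$,

$$ \Pr\Big\{ h^{-2} n^{-1/2} \Big| \sum_{i=1}^{n} (z_i - \bar{z})^2 - \tfrac{1}{12} n h^2 \Big|
   > C' (\log N)^{1/2} \Big\} < D' N^{-2} (\log N)^{-1/2} . $$

Proof:

Now, $E[(z_i - \mu)^2 - \tfrac{1}{12} h^2] = 0$ and
$\mathrm{Var}[(z_i - \mu)^2 - \tfrac{1}{12} h^2] = \tfrac{1}{180} h^4$.

Thus, by Lemma 1, since $(4 \log N)^{1/2}\, n^{-1/6} \to 0$ as
$n \to \infty$, we have

$$ \Pr\Big\{ \sqrt{180}\, h^{-2} n^{-1/2} \sum_{i=1}^{n} \big[(z_i - \mu)^2 - \tfrac{1}{12} h^2\big]
   > (4 \log N)^{1/2} \Big\}
   \Big/ \big[(2\pi)^{-1/2} (4 \log N)^{-1/2} N^{-2}\big] \to 1
   \quad \text{as } n \to \infty . $$

But $\sum_{i=1}^{n} (z_i - \mu)^2 = \sum_{i=1}^{n} (z_i - \bar{z})^2
+ n (\bar{z} - \mu)^2$, therefore as $n \to \infty$,

$$ \Pr\Big\{ \sqrt{180}\, h^{-2} n^{-1/2} \sum_{i=1}^{n} \big[(z_i - \bar{z})^2 - \tfrac{1}{12} h^2\big]
   > (4 \log N)^{1/2} - \sqrt{180}\, h^{-2} n^{1/2} |\bar{z} - \mu|^2 \Big\}
   \Big/ \big[(2\pi)^{-1/2} (4 \log N)^{-1/2} N^{-2}\big] \to 1 . $$

By Lemma 2, the lemma follows.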

Lemma 6:

Let $x_1, x_2, \ldots, x_N$ be a random sample from the uniform distribution
on $[0,1]$. Then there exists a constant $C_0'$ such that

$$ \Pr\Big\{ \sup_I s_I^{-5/2}\, \big|\mathrm{WSS}_I - \tfrac{1}{12} N s_I^3\big|
   < C_0' N \alpha_N^{1/2} \Big\} = 1 - o(1), $$

where $\mathrm{WSS}_I$ = within cluster sum of squares of the observations
in $I$, $s_I$ = length of $I$, and the sup is taken over all open intervals
$I$ (whose boundary points are order statistics) containing at least
$N \alpha_N^{1/3}/16$ observations.
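
The approximation behind Lemma 6, that the within cluster sum of squares of
the observations in an interval of length $s_I$ is close to
$\tfrac{1}{12} N s_I^3$, can be checked directly on simulated uniform data.
The sketch below picks a few intervals of the form
$(x_{(m)}, x_{(m+n_I+1)})$ and prints the ratio of the two quantities along
with the normalised deviation that appears inside the supremum. The sample
size, seed, chosen intervals, and the name `interval_check` are assumptions
of this example.

```python
import math
import random

random.seed(2)
N = 50_000
xs = sorted(random.random() for _ in range(N))
padded = [0.0] + xs + [1.0]  # padded[m] = x_(m), with x_(0) = 0, x_(N+1) = 1
alpha = math.log(N) / N


def interval_check(m, n_obs):
    """For I = (x_(m), x_(m+n_obs+1)) return (WSS_I, N * s_I^3 / 12, s_I)."""
    inside = padded[m + 1 : m + n_obs + 1]  # the n_obs observations in I
    s = padded[m + n_obs + 1] - padded[m]
    mean = sum(inside) / n_obs
    wss = sum((x - mean) ** 2 for x in inside)
    return wss, N * s**3 / 12, s


ratios = []
for m, n_obs in [(0, 1000), (5000, 4000), (20000, 16000), (30000, 1000)]:
    wss, approx, s = interval_check(m, n_obs)
    ratios.append(wss / approx)
    # Quantity bounded by the constant C_0' in Lemma 6.
    normalised = abs(wss - approx) / (s**2.5 * N * math.sqrt(alpha))
    print(n_obs, round(wss / approx, 4), round(normalised, 4))
```

All chosen intervals hold more than $N \alpha_N^{1/3}/16$ observations (about
188 for this $N$), so they fall within the class covered by the lemma.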
Proof:

For any $N \ge N_0'$, consider an interval $I$ of the form
$(x_{(m)}, x_{(m+n_I+1)})$, where $n_I > N \alpha_N^{1/3}/16$. Using Lemma 5
(first conditioning on the two order statistics and then integrating out),
we obtain

$$ \Pr\{ s_I^{-2} n_I^{-1/2}\, |\mathrm{WSS}_I - \tfrac{1}{12} n_I s_I^2| > C' (\log N)^{1/2} \}
   < D' N^{-2} (\log N)^{-1/2} . $$

Now, from Lemma 3, $|n_I - N s_I| = O_p(N^{1/2})$; therefore,
$n_I \le 2 N s_I$ with probability tending to one, and

$$ \Pr\{ s_I^{-5/2}\, |\mathrm{WSS}_I - \tfrac{1}{12} n_I s_I^2| > \sqrt{2}\, C' N^{1/2} (\log N)^{1/2} \}
   < D' N^{-2} (\log N)^{-1/2} . $$

Since the number of possible intervals $I$ is bounded by $N^2$, we have

$$ \Pr\Big\{ \sup_I s_I^{-5/2}\, |\mathrm{WSS}_I - \tfrac{1}{12} N s_I^3| < C_0' N \alpha_N^{1/2} \Big\}
   > 1 - D' (\log N)^{-1/2}, $$

where $C_0' = 2\sqrt{2}\, C'$ and the sup is taken over all intervals of the
form $(x_{(m)}, x_{(m+n_I+1)})$ with $n_I > N \alpha_N^{1/3}/16$.

The lemma follows.
Theorem 3:

Let $x_1, x_2, \ldots, x_N$ be a random sample from the uniform distribution
on $[0,1]$. Let $\mathrm{WSS}_j(N)$ $(j = 1, 2, \ldots, k_N)$ be the within
cluster sum of squares of the $j$th cluster of a locally optimal
$k_N$-partition. Let $\alpha_N = \log N / N$.
Then provided that $k_N$ increases with $N$ in such a way that
$k_N \alpha_N^{1/3} \to 0$ as $N \to \infty$, we have

$$ \max_{1 \le j \le k_N} |12 N^{-1} k_N^3\, \mathrm{WSS}_j(N) - 1| = o_p(1) . $$

Proof:

Consider a locally optimal $k_N$-partition with
$k_N = o(\alpha_N^{-1/3})$.

It is shown in Theorem 2 that for all $N$ large enough, with probability
tending to one,

1. the number of observations in each cluster $> N \alpha_N^{1/3}/16$, and

2. $e_j(N) = k_N^{-1} (1 + o_p(1))$ for all $j = 1, 2, \ldots, k_N$.

From (1), we can apply Lemma 6 to obtain

3. $|\mathrm{WSS}_j(N) - \tfrac{1}{12} N e_j(N)^3| < C_0' N \alpha_N^{1/2}
   e_j(N)^{5/2}$ uniformly in $1 \le j \le k_N$.

Combining (2) and (3), we have uniformly in $1 \le j \le k_N$,

$$ |\mathrm{WSS}_j(N) - \tfrac{1}{12} N k_N^{-3} (1 + o_p(1))|
   < C_0' N \alpha_N^{1/2} k_N^{-5/2} (1 + o_p(1)) . $$

Therefore,

$$ |12 N^{-1} k_N^3\, \mathrm{WSS}_j(N) - 1|
   \le o_p(1) + 12 C_0'\, \alpha_N^{1/2} k_N^{1/2} (1 + o_p(1)) $$

uniformly in $1 \le j \le k_N$. And the theorem is proved.

(Remark: Since the global optimum is necessarily locally optimal, the
results of Theorem 2 and Theorem 3 also apply to the globally optimal
$k_N$-partition.)

4.2. The General Case

For samples from a general distribution $F$, the results analogous
to Theorem 2 and Theorem 3 are given, respectively, in Theorem 4 and
Theorem 5. The proofs proceed in the same way as before.

Lemma 7:

Let $z_1, z_2, \ldots$ be i.i.d. random variables from some distribution
with finite variance $\sigma^2$.

Put $\alpha_N = \log N / N$ and let $E(z_1) = \mu$.

Then there exist constants $C$, $D$, and $N_0$ such that if $N \ge N_0$ and
$n > N \alpha_N^{1/3}/16$,

$$ \Pr\Big\{ \sigma^{-1} n^{1/2} \Big| \frac{z_1 + z_2 + \cdots + z_n}{n} - \mu \Big|
   > C (\log N)^{1/2} \Big\} < D N^{-2} (\log N)^{-1/2} . $$

(The proof is similar to that of Lemma 2.)

Lemma 8:

Let $x_1, x_2, \ldots, x_N$ be a random sample from a distribution $F'$ on
$[a,b]$.

Denote $\int_I dF'$ by $F'(I)$.
Then

$$ \sup_I |n_I - N F'(I)| = O_p(N^{1/2}), $$

where $n_I$ is the number of observations in $I$,
and the sup is taken over all open subintervals $I$ of $[a,b]$.

(Like Lemma 3, this lemma is a direct consequence of Donsker's theorem
for empirical processes.)

Let $F$ be a distribution on $[0,1]$ with the following properties:

1. the density $f$ and its first two derivatives are continuous on
   $[0,1]$;

2. $f(x) > 0$ for all $x \in [0,1]$.

Lemma 9:

Let $x_1, x_2, \ldots, x_N$ be a random sample from $F$.

Denote the inf of the density $f$ by $g$, and put $F(I) = \int_I dF$.

Then there exists a constant $C_0$ such that

$$ \Pr\Big\{ \sup_I s_I^{-1/2}\, |\bar{x}_I - \mu_I| < C_0 \alpha_N^{1/2} \Big\} = 1 - o(1), $$

where $\bar{x}_I$ = mean of observations in $I$,
      $\mu_I = \int_I x\,dF / F(I)$ = conditional mean of $F$ on $I$,
      $s_I$ = length of $I$,

and the sup is taken over all open intervals $I$ (whose boundary points
are order statistics) containing at least $N \alpha_N^{1/3} g/16$
observations.
Proof:

For any $N \ge N_0$, consider an interval $I$ of the form $(x_{(m)}, x_{(m+n_I+1)})$, where $n_I > N\alpha_N^{1/3} g/16$.

Using Lemma 7 (first conditioning on the two order statistics and then integrating out), we obtain

$$\Pr\left\{ \sigma_I^{-1} n_I^{1/2} |\bar{x}_I - \mu_I| > C(\log N)^{1/2} \right\} \le D N^{-2} (\log N)^{-1/2} , \tag{4.8}$$

where $\sigma_I^2 = \int_I (x - \mu_I)^2 dF / F(I)$ = conditional variance of $F$ on $I$.

Now, by the Taylor series expansion of $f$,

$$f(x) = f(m_I) + (x - m_I) f^{(1)}(m_I) + \tfrac{1}{2}(x - m_I)^2 f^{(2)}(\xi_x) \quad \text{for all } x \text{ in } I ,$$

where $m_I$ = midpoint of $I$ and $\xi_x$ is between $x$ and $m_I$.

Therefore

$$F(I) = f(m_I) s_I [1 + O(s_I^2)] ,$$

and likewise $\sigma_I^2 = \tfrac{1}{12} s_I^2 [1 + O(s_I^2)]$.

(Note that the universal constant in the $O$ term depends on the bound of the second derivative of $f$.)

And hence (4.8) can be written as

$$\Pr\left\{ \sqrt{12}\, s_I^{-1} [1 + O(s_I^2)]\, n_I^{1/2} |\bar{x}_I - \mu_I| > C(\log N)^{1/2} \right\} \le D N^{-2} (\log N)^{-1/2} .$$

Since the number of possible intervals $I$ is bounded by $N^2$, we have

$$\Pr\left\{ \sup_I s_I^{-1} [1 + O(s_I^2)]\, n_I^{1/2} |\bar{x}_I - \mu_I| \le \tfrac{1}{\sqrt{12}} C(\log N)^{1/2} \right\} \ge 1 - D(\log N)^{-1/2} = 1 - o(1) .$$

Now, from Lemma 8, we have uniformly in $I$,

$$n_I > \tfrac{1}{2} N F(I) \ge \tfrac{1}{2} N g s_I \quad \text{with probability tending to one.}$$

Therefore,

$$\Pr\left\{ \sup_I s_I^{-1/2} |\bar{x}_I - \mu_I| \le C_0 \alpha_N^{1/2} \right\} \to 1 \quad \text{as } N \to \infty ,$$

where $C_0 = (2/\sqrt{6})\, g^{-1/2} C$, and the sup is taken over all intervals of the form $(x_{(m)}, x_{(m+n_I+1)})$ with $n_I > N\alpha_N^{1/3} g/16$.
Theorem 4:

Let $x_1, x_2, \ldots, x_N$ be a random sample from $F$.

Let $\varepsilon_j(N)$ $(j = 1, 2, \ldots, k_N)$ be the length of the jth cluster of a locally optimal $k_N$-partition.

Then, provided that $k_N = o(\alpha_N^{-1/3})$, we have

$$\max_{1 \le j \le k_N} \left| k_N\, \varepsilon_j(N)\, f_j^{1/3} - \int_0^1 f(x)^{1/3} dx \right| = o_p(1) ,$$

where $f_j$ is the density at the midpoint of the jth cluster.
Proof:

Consider a locally optimal $k_N$-partition with $k_N = o(\alpha_N^{-1/3})$.

Denote the open interval (whose boundary points are order statistics) containing the jth cluster by $I_j$, and let its midpoint be $m_j$.

Then, as before, we have

$$\mu_j = \int_{I_j} x\,dF / F(I_j) = m_j + \frac{1}{12} \cdot \frac{f^{(1)}(m_j)}{f(m_j)}\, \varepsilon_j^2 + O(\varepsilon_j^3) . \tag{4.9}$$

Again, to avoid wordiness, statements are to be read as if they included the qualification: "with probability tending to one as $N$ approaches infinity".

[I] Suppose that $2(h/g)^{1/3}/k_N \ge \varepsilon_j \ge 1/(2k_N)$, where $h = \sup f(x)$.

Then, $F(I_j) \ge g/(2k_N)$.

By Lemma 8, the jth cluster contains at least $\dfrac{Ng}{2k_N} - O_p(N^{1/2})$ observations. Thus, the number of observations in the jth cluster exceeds $Ng/(4k_N)$ eventually.

Since $k_N = o(\alpha_N^{-1/3})$, this number $> N\alpha_N^{1/3} g/4$.

Applying Lemma 9, we have

$$|\bar{x}_j - \mu_j| \le C_0\, \alpha_N^{1/2} \varepsilon_j^{1/2} , \tag{4.10}$$

where $\bar{x}_j$ = mean of observations in the jth cluster.

Consider the (j-1)st cluster, a cluster adjacent to the jth cluster.

Using the argument given in part I of the proof of Theorem 2, it can be shown that

the (j-1)st cluster contains at least $N\alpha_N^{1/3} g/16$ observations, (4.11)

and

$$\left(m_{j-1} + \tfrac{1}{2}\varepsilon_{j-1} + O_p(\alpha_N)\right) - \bar{x}_{j-1} = \bar{x}_j - \left(m_j - \tfrac{1}{2}\varepsilon_j + O_p(\alpha_N)\right) . \tag{4.12}$$

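[II] From (4.9), (4.12) can be written as

$$\tfrac{1}{2}\varepsilon_{j-1} - \frac{1}{12} \cdot \frac{f^{(1)}(m_{j-1})}{f(m_{j-1})}\, \varepsilon_{j-1}^2 - O(\varepsilon_{j-1}^3) - (\bar{x}_{j-1} - \mu_{j-1})$$
$$= \bar{x}_j - \left( \mu_j - \frac{1}{12} \cdot \frac{f^{(1)}(m_j)}{f(m_j)}\, \varepsilon_j^2 - O(\varepsilon_j^3) - \tfrac{1}{2}\varepsilon_j \right) + 2\,O_p(\alpha_N) . \tag{4.13}$$

Let $f^*$ be the density at the midpoint between $m_j$ and $m_{j-1}$.

Then

$$\tfrac{1}{2}\varepsilon_j + \frac{1}{12} \cdot \frac{f^{(1)}(m_j)}{f(m_j)}\, \varepsilon_j^2 + O(\varepsilon_j^3) = \tfrac{1}{2}\varepsilon_j \left[ 1 + \frac{1}{6} \cdot \frac{f^{(1)}(m_j)}{f(m_j)}\, \varepsilon_j + O(\varepsilon_j^2) \right]$$
$$= \tfrac{1}{2}\varepsilon_j \left( \frac{f^*}{f(m_j)} \right)^{-1/3} \left[ 1 + O(\varepsilon_j^2) + O_p(\alpha_N) \right] ,$$

since

$$f^* = f(m_j) - \tfrac{1}{2} f^{(1)}(m_j)\, \varepsilon_j + O(\varepsilon_j^2) + O_p(\alpha_N)$$

by the expansion of $f$ about $m_j$. Similarly, by the expansion of $f$ about $m_{j-1}$,

$$\tfrac{1}{2}\varepsilon_{j-1} - \frac{1}{12} \cdot \frac{f^{(1)}(m_{j-1})}{f(m_{j-1})}\, \varepsilon_{j-1}^2 - O(\varepsilon_{j-1}^3) = \tfrac{1}{2}\varepsilon_{j-1} \left( \frac{f^*}{f(m_{j-1})} \right)^{-1/3} \left[ 1 + O(\varepsilon_{j-1}^2) + O_p(\alpha_N) \right] .$$

Thus (4.13) becomes

$$\tfrac{1}{2}\varepsilon_{j-1} \left( \frac{f^*}{f(m_{j-1})} \right)^{-1/3} [1 + O(\varepsilon_{j-1}^2)] - (\bar{x}_{j-1} - \mu_{j-1}) = (\bar{x}_j - \mu_j) + \tfrac{1}{2}\varepsilon_j \left( \frac{f^*}{f(m_j)} \right)^{-1/3} [1 + O(\varepsilon_j^2)] + 2\,O_p(\alpha_N) . \tag{4.14}$$

But from (4.11), we can apply Lemma 9 to the (j-1)st cluster to give

$$|\bar{x}_{j-1} - \mu_{j-1}| \le C_0\, \alpha_N^{1/2} \varepsilon_{j-1}^{1/2} . \tag{4.15}$$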

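Therefore, combining (4.10), (4.14) and (4.15), we can first show that the ratio $\varepsilon_{j-1}/\varepsilon_j$ is bounded, and then

$$\left| \frac{\varepsilon_{j-1}}{\varepsilon_j} \left( \frac{f(m_{j-1})}{f(m_j)} \right)^{1/3} - 1 \right| \le 4 C_0 \left( \frac{f^*}{f(m_j)} \right)^{1/3} \alpha_N^{1/2} \varepsilon_j^{-1/2} + 2 \varepsilon_j^{-1} O_p(\alpha_N) + O(\varepsilon_j^2) \tag{4.16}$$
$$\le k_N^{-1} \left[ 4\sqrt{2}\, C_0 \left( \frac{h}{g} \right)^{1/3} k_N^{3/2} \alpha_N^{1/2} + 4 k_N^2\, O_p(\alpha_N) + O(k_N^{-1}) \right]$$
$$= k_N^{-1}\, o_p(1) ;$$

and this bound does not depend on the intervals involved.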

[III] From the first inequality in (4.16), we can show by contradiction that at least one of the $k_N$ cluster intervals satisfies the condition assumed in [I].

Then using the bound in (4.16) and carrying out at most $k_N$ comparisons of adjacent clusters, we obtain

$$\frac{\varepsilon_i}{\varepsilon_j} \left( \frac{f(m_i)}{f(m_j)} \right)^{1/3} = \left[ 1 + k_N^{-1} o_p(1) \right]^{k_N} = 1 + o_p(1)$$

uniformly in $1 \le i, j \le k_N$.

Since $\sum_{j=1}^{k_N} \varepsilon_j f(m_j)^{1/3} \to \int_0^1 f(x)^{1/3} dx$ as $N \to \infty$, the theorem follows.
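The conclusion of Theorem 4 can be explored by simulation. The sketch below makes several assumptions that are not in the paper: the density is taken to be $f(x) = \tfrac{2}{3}(1+x)$ on $[0,1]$, the partition is found by a Lloyd-type iteration (whose fixed points place each cutpoint midway between adjacent cluster means, a necessary condition for local optimality rather than the exchange criterion itself), and cluster lengths are measured between cutpoints with 0 and 1 as the outer ends. The names `density`, `draw_sample` and `lloyd_1d` are hypothetical.

```python
import bisect
import math
import random
from itertools import accumulate

def density(x):
    """Illustrative density on [0, 1]: f(x) = (2/3)(1 + x) (an assumption)."""
    return (2.0 / 3.0) * (1.0 + x)

def draw_sample(n, rng):
    """Inverse-CDF sampling: F(x) = (2/3)(x + x^2/2), so x = sqrt(1 + 3u) - 1."""
    return sorted(math.sqrt(1.0 + 3.0 * rng.random()) - 1.0 for _ in range(n))

def lloyd_1d(xs, k, max_iter=5000):
    """Lloyd-type iteration on sorted data; returns the k - 1 interior cutpoints.

    Assumes no cluster becomes empty, which holds for the smooth example here.
    """
    n = len(xs)
    prefix = [0.0] + list(accumulate(xs))          # prefix sums give cluster means cheaply
    idx = [round(j * n / k) for j in range(k + 1)]  # start from equal-count clusters
    cuts = []
    for _ in range(max_iter):
        means = [(prefix[idx[j + 1]] - prefix[idx[j]]) / (idx[j + 1] - idx[j])
                 for j in range(k)]
        # each cutpoint is the midpoint between adjacent cluster means
        cuts = [(means[j] + means[j + 1]) / 2.0 for j in range(k - 1)]
        new_idx = [0] + [bisect.bisect_left(xs, c) for c in cuts] + [n]
        if new_idx == idx:  # assignments unchanged: a fixed point has been reached
            break
        idx = new_idx
    return cuts

rng = random.Random(0)
N, k = 100_000, 8
xs = draw_sample(N, rng)
cuts = lloyd_1d(xs, k)
edges = [0.0] + cuts + [1.0]

# G = integral of f^(1/3) over [0, 1], in closed form for this density.
G = (2.0 / 3.0) ** (1.0 / 3.0) * 0.75 * (2.0 ** (4.0 / 3.0) - 1.0)

# k * eps_j * f(m_j)^(1/3) should be close to G for every cluster j.
scaled = [k * (edges[j + 1] - edges[j])
          * density((edges[j] + edges[j + 1]) / 2.0) ** (1.0 / 3.0)
          for j in range(k)]
max_rel_dev = max(abs(s - G) / G for s in scaled)
print(G, scaled, max_rel_dev)
```

Cluster lengths come out longer where the density is lower, and after scaling by $k_N f_j^{1/3}$ they should all lie close to $G$.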


Next, we will assume that $F$ is four times differentiable and show that the within cluster sums of squares are asymptotically equal.

Lemma 10:

Let $z_1, z_2, \ldots$ be i.i.d. random variables from some distribution with finite fourth moment $\gamma$.

Put $\alpha_N = \log N / N$ and let $\mathrm{var}(z_1) = \sigma^2$.

Then there exist constants $C'$, $D'$ and $N_0'$, such that if $N \ge N_0'$ and $n > N\alpha_N^{1/3}/16$,

$$\Pr\left\{ \gamma^{-1/2} n^{-1/2} \left| \sum_{i=1}^{n} (z_i - \bar{z})^2 - n\sigma^2 \right| > C'(\log N)^{1/2} \right\} \le D' N^{-2} (\log N)^{-1/2} .$$

(The proof is similar to that of Lemma 5.)

Lemma 11:

Let $x_1, x_2, \ldots, x_N$ be a random sample from $F$.

Then there exists a constant $C_0'$ such that

$$\Pr\left\{ \sup_I s_I^{-5/2} \left| \mathrm{WSS}_I - \tfrac{1}{12} N f(m_I) s_I^3 [1 + O(s_I^2)] \right| \le C_0' N \alpha_N^{1/2} \right\} = 1 - o(1) ,$$

where $s_I$ = length of $I$, $m_I$ = midpoint of $I$, $\mathrm{WSS}_I$ = within cluster sum of squares of the observations in $I$, and the sup is taken over all open intervals $I$ (whose boundary points are order statistics) containing at least $N\alpha_N^{1/3} g/16$ observations.

Proof:

For any $N \ge N_0'$, consider an interval $I$ of the form $(x_{(m)}, x_{(m+n_I+1)})$, where $n_I > N\alpha_N^{1/3} g/16$.

Using Lemma 10 (first conditioning on the two order statistics and then integrating out), we obtain

$$\Pr\left\{ \gamma_I^{-1/2} n_I^{-1/2} \left| \mathrm{WSS}_I - n_I \sigma_I^2 \right| > C'(\log N)^{1/2} \right\} \le D' N^{-2} (\log N)^{-1/2} , \tag{4.17}$$

where $\mu_I = \int_I x\,dF / F(I)$, $\sigma_I^2 = \int_I (x - \mu_I)^2 dF / F(I)$, and $\gamma_I = \int_I (x - \mu_I)^4 dF / F(I)$.

Now by the Taylor series expansion of $f$,

$$f(x) = f(m_I) + (x - m_I) f^{(1)}(m_I) + \tfrac{1}{2}(x - m_I)^2 f^{(2)}(m_I) + \tfrac{1}{6}(x - m_I)^3 f^{(3)}(m_I) + \tfrac{1}{24}(x - m_I)^4 f^{(4)}(\xi_x) ,$$

where $\xi_x$ is between $x$ and $m_I$.

Therefore,

$$F(I) = f(m_I) s_I [1 + O(s_I^2)] , \tag{4.18}$$

$$\sigma_I^2 = \tfrac{1}{12} s_I^2 [1 + O(s_I^2)] ,$$

and

$$\gamma_I = \tfrac{1}{180} s_I^4 [1 + O(s_I^2)] .$$

(Note that the universal constant in the $O$ term depends on the bounds of the derivatives of $f$.)

And hence (4.17) can be written as

$$\Pr\left\{ \sqrt{180}\, s_I^{-2} [1 + O(s_I^2)]\, n_I^{-1/2} \left| \mathrm{WSS}_I - \tfrac{1}{12} n_I s_I^2 [1 + O(s_I^2)] \right| > C'(\log N)^{1/2} \right\} \le D' N^{-2} (\log N)^{-1/2} .$$
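Since the number of possible intervals $I$ is bounded by $N^2$, we have

$$\Pr\left\{ \sup_I s_I^{-2} n_I^{-1/2} \left| \mathrm{WSS}_I - \tfrac{1}{12} n_I s_I^2 [1 + O(s_I^2)] \right| \le \tfrac{2}{\sqrt{180}} C'(\log N)^{1/2} \right\} = 1 - o(1) .$$

Now, from Lemma 8 and (4.18),

$$n_I = N F(I) + O_p(N^{1/2}) = N f(m_I) s_I [1 + O(s_I^2)] + O_p(N^{1/2}) .$$

Therefore, $n_I \le 2Nh s_I$ with probability tending to one ($h = \sup f$), and

$$\Pr\left\{ \sup_I s_I^{-5/2} \left| \mathrm{WSS}_I - \tfrac{1}{12} N f(m_I) s_I^3 [1 + O(s_I^2)] \right| \le C_0' N \alpha_N^{1/2} \right\} \to 1 \quad \text{as } N \to \infty ,$$

where $C_0' = (2/\sqrt{90})\, h^{1/2} C'$ and the sup is taken over all intervals of the form $(x_{(m)}, x_{(m+n_I+1)})$ with $n_I > N\alpha_N^{1/3} g/16$.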
Theorem 5:

Let $x_1, x_2, \ldots, x_N$ be a random sample from $F$.

Let $\mathrm{WSS}_j(N)$ $(j = 1, 2, \ldots, k_N)$ be the within cluster sum of squares of the jth cluster of a locally optimal $k_N$-partition.

Then, provided that $k_N = o(\alpha_N^{-1/3})$, we have

$$\max_{1 \le j \le k_N} \left| 12 N^{-1} k_N^3\, \mathrm{WSS}_j(N) - \left( \int_0^1 f(x)^{1/3} dx \right)^3 \right| = o_p(1) .$$

Proof:

Consider a locally optimal $k_N$-partition with $k_N = o(\alpha_N^{-1/3})$.

It is shown in Theorem 4 that, for all $N$ large enough, with probability tending to one,

1. the number of observations in each cluster $> N\alpha_N^{1/3} g/16$, and

2. $\varepsilon_j = \varepsilon_j(N) = k_N^{-1} f(m_j)^{-1/3}\, G\, [1 + o_p(1)]$, where $G = \int_0^1 f(x)^{1/3} dx$.

From (1), we can apply Lemma 11 to obtain

$$\varepsilon_j^{-5/2} \left| \mathrm{WSS}_j(N) - \tfrac{1}{12} N f(m_j) \varepsilon_j^3 [1 + O(\varepsilon_j^2)] \right| \le C_0' N \alpha_N^{1/2}$$

uniformly in $1 \le j \le k_N$.

From (2), we have uniformly in $1 \le j \le k_N$

$$\left| \mathrm{WSS}_j(N) - \tfrac{1}{12} N k_N^{-3} G^3 [1 + o_p(1)] \right| \le 2 C_0' N \alpha_N^{1/2} k_N^{-5/2} g^{-5/6} G^{5/2} .$$

Therefore,

$$\left| 12 N^{-1} k_N^3\, \mathrm{WSS}_j(N) - G^3 \right| \le o_p(1) + C^* k_N^{1/2} \alpha_N^{1/2} \quad (\text{where } C^* = 24\, C_0'\, g^{-5/6} G^{5/2})$$
$$= o_p(1) .$$

(Remark: As before, since the global optimum is necessarily locally optimal, the results of Theorem 4 and Theorem 5 also apply to the globally optimal $k_N$-partition. Moreover, the generalization to densities with finite support $[a,b]$ is immediate.)

5.   WEAK UNIFORM CONSISTENCY OF
THE HISTOGRAM ESTIMATES
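The link between the cell lengths of Theorem 4 and the equal sums of squares of Theorem 5 can be illustrated at the population level. The sketch below is not a simulation of the theorem itself: it assumes the illustrative density $f(x) = \tfrac{2}{3}(1+x)$ on $[0,1]$, builds cells that each carry an equal share $G/k$ of $\int f^{1/3}$ (so that $\varepsilon_j \approx G k^{-1} f_j^{-1/3}$), and evaluates the per-observation within-cell sum of squares exactly from moments. The names `cutpoint`, `moments` and `scaled_wss` are hypothetical.

```python
import math

# G = integral of f^(1/3) over [0, 1] for the assumed density f(x) = (2/3)(1 + x).
G = (2.0 / 3.0) ** (1.0 / 3.0) * 0.75 * (2.0 ** (4.0 / 3.0) - 1.0)

def cutpoint(t):
    """Point y with integral_0^y f^(1/3) dx = t * G, for 0 <= t <= 1."""
    return (1.0 + t * (2.0 ** (4.0 / 3.0) - 1.0)) ** 0.75 - 1.0

def moments(a, b):
    """Return (M0, M1, M2): integrals of f, x*f and x^2*f over [a, b]."""
    def anti(p, x):
        # antiderivative of x^p * (2/3)(1 + x)
        return (2.0 / 3.0) * (x ** (p + 1) / (p + 1) + x ** (p + 2) / (p + 2))
    return tuple(anti(p, b) - anti(p, a) for p in (0, 1, 2))

def scaled_wss(k):
    """12 * k^3 * WSS_j / N per cell, using WSS_j / N = M2 - M1^2 / M0."""
    edges = [cutpoint(j / k) for j in range(k + 1)]
    out = []
    for a, b in zip(edges, edges[1:]):
        m0, m1, m2 = moments(a, b)
        out.append(12.0 * k ** 3 * (m2 - m1 * m1 / m0))
    return out

values = scaled_wss(50)
max_rel_dev = max(abs(v - G ** 3) / G ** 3 for v in values)
print(G ** 3, min(values), max(values), max_rel_dev)
```

Every cell gives nearly the same scaled sum of squares, close to $G^3$, even though the cells differ in length and in the number of observations they would contain.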

In this section, we will investigate the asymptotic properties of the k-means procedure as a density estimation technique. Let $x_1, x_2, \ldots, x_N$ be a random sample from some population $F$ on $[a,b]$. Suppose that the density $f$ is four times differentiable and is strictly positive on $[a,b]$.

Consider a locally optimal $k_N$-partition.

Let $a = y_0(N) < y_1(N) < \cdots < y_{k_N}(N) = b$ be the cutpoints of the partition; the cutpoint between two clusters is defined to be the midpoint between the cluster means.

Denote $\int_a^b f(x)^{1/3} dx$ by $G$.

Then from Theorem 4, Theorem 5, and Lemma 8 respectively, we have uniformly in $1 \le j \le k_N$,

$$\varepsilon_j = G k_N^{-1} f_j^{-1/3} (1 + o_p(1)) \tag{5.1}$$

$$\mathrm{WSS}_j = \tfrac{1}{12} G^3 N k_N^{-3} (1 + o_p(1)) \tag{5.2}$$

$$n_j = N f_j \varepsilon_j (1 + O(\varepsilon_j^2)) + O_p(N^{1/2}) . \tag{5.3}$$

Therefore, substituting (5.1) in (5.3) gives

$$n_j = G N k_N^{-1} f_j^{2/3} (1 + o_p(1)) . \tag{5.4}$$
Define the density estimate (Estimate I) at a point $x$ by

$$f_N(x) = \frac{n_j^{3/2}}{N (12\, \mathrm{WSS}_j)^{1/2}} , \qquad y_{j-1} \le x < y_j ; \quad 1 \le j \le k_N .$$

Then, from (5.2) and (5.4),

$$f_N(x) = \frac{G^{3/2} N^{3/2} k_N^{-3/2} f_j\, (1 + o_p(1))}{G^{3/2} N^{3/2} k_N^{-3/2} (1 + o_p(1))} = f_j (1 + o_p(1)) \quad \text{uniformly in } 1 \le j \le k_N .$$

Since $f$ is uniformly continuous, we have shown the following.

Theorem 6:

$$\sup_{a < x < b} |f_N(x) - f(x)| = o_p(1) .$$
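Estimate I is straightforward to compute once a partition is available. The sketch below evaluates the formula for given cluster counts and within cluster sums of squares; the function names `estimate_one` and `within_ss` are hypothetical. As a check that needs no clustering routine, it uses an assumed toy data set, a regular grid on $[0,1]$ cut into $k$ equal blocks, for which the true density is 1 everywhere.

```python
import math

def estimate_one(counts, wss, n_total):
    """Estimate I per cluster: n_j^(3/2) / (N * sqrt(12 * WSS_j))."""
    return [n ** 1.5 / (n_total * math.sqrt(12.0 * w))
            for n, w in zip(counts, wss)]

def within_ss(points):
    """Sum of squared deviations of the points from their own mean."""
    mean = sum(points) / len(points)
    return sum((p - mean) ** 2 for p in points)

# Toy data (an assumption for illustration): a regular grid on [0, 1],
# split into k consecutive blocks of equal size.
N, k = 1000, 10
xs = [(i + 0.5) / N for i in range(N)]
clusters = [xs[j * N // k:(j + 1) * N // k] for j in range(k)]

counts = [len(c) for c in clusters]
wss = [within_ss(c) for c in clusters]
f_hat = estimate_one(counts, wss, N)
print(f_hat)
```

The formula can be read as the ordinary histogram height $n_j / (N \varepsilon_j)$ with the cell length $\varepsilon_j$ replaced by $(12\, \mathrm{WSS}_j / n_j)^{1/2}$, the length of a uniform cell having the observed within cluster variance.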

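Moreover, from (5.4), we have

$$n_j + n_{j-1} = 2 G N k_N^{-1} f_j^{2/3} (1 + o_p(1)) . \tag{5.5}$$

And since $\bar{x}_j - \bar{x}_{j-1} = \tfrac{1}{2}(\varepsilon_j + \varepsilon_{j-1})(1 + o_p(1)) = \varepsilon_j (1 + o_p(1))$, we can obtain from (5.1), (5.2) and (5.5),

$$\mathrm{WSS}_j^* = \mathrm{WSS}_j + \mathrm{WSS}_{j-1} + \tfrac{1}{4}(n_j + n_{j-1})(\bar{x}_j - \bar{x}_{j-1})^2 \quad \text{(by definition)}$$
$$= \tfrac{1}{12} \left[ 2 G^3 N k_N^{-3} (1 + o_p(1)) + 6 G^3 N k_N^{-3} (1 + o_p(1)) \right]$$
$$= \tfrac{1}{12} \left[ 8 G^3 N k_N^{-3} (1 + o_p(1)) \right] .$$

Define the pooled density estimate (Estimate II) at a point $x$ by

$$f_N^*(x) = \begin{cases} \dfrac{(n_j + n_{j-1})^{3/2}}{N (12\, \mathrm{WSS}_j^*)^{1/2}} , & \bar{x}_{j-1} < x \le \bar{x}_j \quad (2 \le j \le k_N) ; \\[2ex] \dfrac{(n_1 + n_2)^{3/2}}{N (12\, \mathrm{WSS}_2^*)^{1/2}} , & a \le x \le \bar{x}_1 ; \\[2ex] \dfrac{(n_{k_N} + n_{k_N - 1})^{3/2}}{N (12\, \mathrm{WSS}_{k_N}^*)^{1/2}} , & \bar{x}_{k_N} < x \le b . \end{cases}$$

Then,

$$f_N^*(x) = f_j (1 + o_p(1)) \quad \text{uniformly in } 1 \le j \le k_N .$$

And hence, from the uniform continuity of $f$, we have

Corollary 7:

$$\sup_{a < x < b} |f_N^*(x) - f(x)| = o_p(1) .$$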
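The pooled estimate can be sketched in the same way. The code below follows the definitions above, including the pooled sum of squares with the $\tfrac{1}{4}(n_j + n_{j-1})$ weight as written in the text; the function names `pooled_estimates` and `evaluate` are hypothetical, and the regular grid split into equal blocks is again an assumed toy data set with true density 1.

```python
import bisect
import math

def pooled_estimates(counts, wss, means, n_total):
    """Estimate II for each adjacent pair of clusters (j-1, j), in order."""
    values = []
    for j in range(1, len(counts)):
        n_pair = counts[j] + counts[j - 1]
        # pooled sum of squares, using the 1/4 * (n_j + n_{j-1}) weight from the text
        wss_star = (wss[j] + wss[j - 1]
                    + 0.25 * n_pair * (means[j] - means[j - 1]) ** 2)
        values.append(n_pair ** 1.5 / (n_total * math.sqrt(12.0 * wss_star)))
    return values

def evaluate(x, means, values):
    """Estimate II at x: the pair (j-1, j) applies on (mean_{j-1}, mean_j];
    the first and last pairs extend to the ends of the support."""
    j = bisect.bisect_left(means, x)        # index of the first mean >= x
    j = min(max(j, 1), len(means) - 1)      # clamp to a valid pair
    return values[j - 1]

# Toy data (an assumption for illustration): regular grid on [0, 1] in k blocks.
N, k = 1000, 10
xs = [(i + 0.5) / N for i in range(N)]
clusters = [xs[j * N // k:(j + 1) * N // k] for j in range(k)]
counts = [len(c) for c in clusters]
means = [sum(c) / len(c) for c in clusters]
wss = [sum((p - m) ** 2 for p in c) for c, m in zip(clusters, means)]

values = pooled_estimates(counts, wss, means, N)
print(values, evaluate(0.5, means, values))
```

Because each value pools two neighbouring clusters, the resulting step function changes at the cluster means rather than at the cutpoints, which is why the cells of Estimate II straddle the boundaries used by Estimate I.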
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6.   CONCLUDING  REMARKS 

In  constructing  a  histogram  to  estimate  an  unknown  density  function 
which  vanishes  outside  the  finite  interval   [a,  b],  the  results  in 
Section  4  indicate  that  the  k-means  procedure  would  partition  [a,  b]  in 
such  a  way  that  the  sizes  of  the  intervals  are  adaptive  to  the  underlying 
density;   the  intervals  are  large  where  the  density  is  low  while  the 
intervals are small where the density is high.   Thus k-means can be regarded
as a useful tool for generating variable cell histograms;   indeed, the
two estimates given in Section 5 are shown to be uniformly consistent in
probability.   However, it should be pointed out that other important
large sample properties, like the mean squared error and the rate of
convergence, of the estimates have not been considered.

A  major  difficulty  of  the  usual  histogram  is  that  when  multivariate 
histograms  are  constructed  by  partitioning  the  sampled  space  into  cells 
of  equal  size,  there  are  too  many  cells  with  very  few  observations.   One 
desirable  feature  of  the  k-means  procedure  is  that  it  provides  a  practi- 
cable and  convenient  way  of  obtaining  a  k-partition  of  the  multivariate 
data  or  equivalently,  the  multidimensional  sampled  space.   Consequently, 
histogram estimates of the density over these k cells or regions can
easily  be  obtained.   Unfortunately,   the  proofs  of  the  theorem  for  the 
univariate  case  cannot  be  easily  generalized  to  the  multivariate  case. 
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Much  work  has  yet  to  be  done  to  investigate  the  asymptotic  properties 

of  k-means  partition  of  samples  from  two  or  more  dimensional  distributions. 

Finally,  some  results  of  an  empirical  study  of  the  density  estimates 
proposed  in  Section  5  are  reported  in  Hartigan  and  Wong  (1979b).   In 
general, the numerical results obtained in the study provide an empirical
validation of the asymptotic properties derived here;   for details, see
Wong (1979).   Two examples showing the performance of Estimate II are
given in Figure A and Figure B;   it should be pointed out that this pooled
estimate consistently outperforms Estimate I in the empirical study.
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